{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4d6f1074",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "If you would like to add a reaction to the databases which does not currently appear in the database you should follow these steps here. If you already have initial and final frames, you can avoid this step and instead please see https://github.com/FAIR-Chem/fairchem/blob/4164cc08695d8283714b07bbcb03ff2916a797c9/src/fairchem/applications/cattsunami/tutorial/fairchem_models_for_nebs.ipynb"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c99c1149",
   "metadata": {},
   "source": [
    "## 1. Add adsorbates to the adsorbate dbs if necessary\n",
    "\n",
    "If the adsorbates you are interested in are also absent from the adsorbate databases you will have to add them as well. If the adsorbate is very dissimilar from those present in the OC20 database, be aware that you may have higher errors due to the adsorbate being out of domain. The validation dataset has shown that there is generalization to unseen adsorbates that are similar to those in domain."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "82488e4c",
   "metadata": {},
   "source": [
    "### Are the adsorbates involved in your reaction already in the adsorbate db?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "5e38d684",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pickle\n",
    "from fairchem.data.oc.databases.pkls import ADSORBATE_PKL_PATH\n",
    "\n",
    "with open(ADSORBATE_PKL_PATH, \"rb\") as f:\n",
    "    adsorbates = pickle.load(f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "1614b751",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0 *O\n",
      "1 *H\n",
      "2 *OH\n",
      "3 *OH2\n",
      "4 *C\n",
      "5 *CO\n",
      "6 *CH\n",
      "7 *CHO\n",
      "8 *COH\n",
      "9 *CH2\n",
      "10 *CH2*O\n",
      "11 *CHOH\n",
      "12 *CH3\n",
      "13 *OCH3\n",
      "14 *CH2OH\n",
      "15 *CH4\n",
      "16 *OHCH3\n",
      "17 *C*C\n",
      "18 *CCO\n",
      "19 *CCH\n",
      "20 *CHCO\n",
      "21 *CCHO\n",
      "22 *COCHO\n",
      "23 *CCHOH\n",
      "24 *CCH2\n",
      "25 *CH*CH\n",
      "26 CH2*CO\n",
      "27 *CHCHO\n",
      "28 CH*COH\n",
      "29 *COCH2O\n",
      "30 *CHO*CHO\n",
      "31 *COHCHO\n",
      "32 *COHCOH\n",
      "33 *CCH3\n",
      "34 *CHCH2\n",
      "35 *COCH3\n",
      "36 *OCHCH2\n",
      "37 *COHCH2\n",
      "38 *CHCHOH\n",
      "39 *CCH2OH\n",
      "40 *CHOCHOH\n",
      "41 *COCH2OH\n",
      "42 *COHCHOH\n",
      "43 *CH2*CH2\n",
      "44 *OCHCH3\n",
      "45 *COHCH3\n",
      "46 *CHOHCH2\n",
      "47 *CHCH2OH\n",
      "48 *OCH2CHOH\n",
      "49 *CHOCH2OH\n",
      "50 *COHCH2OH\n",
      "51 *CHOHCHOH\n",
      "52 *CH2CH3\n",
      "53 *OCH2CH3\n",
      "54 *CHOHCH3\n",
      "55 *CH2CH2OH\n",
      "56 *CHOHCH2OH\n",
      "57 *OHCH2CH3\n",
      "58 *NH2N(CH3)2\n",
      "59 *ONN(CH3)2\n",
      "60 *OHNNCH3\n",
      "61 *NNCH3\n",
      "62 *ONH\n",
      "63 *NHNH\n",
      "64 *NH2NH2\n",
      "65 *N*NH\n",
      "66 *ONNO2\n",
      "67 *NO2NO2\n",
      "68 *N*NO\n",
      "69 *N2\n",
      "70 *ONNH2\n",
      "71 *NH2\n",
      "72 *NH3\n",
      "73 *NONH\n",
      "74 *NH\n",
      "75 *NO2\n",
      "76 *NO\n",
      "77 *N\n",
      "78 *NO3\n",
      "79 *OHNH2\n",
      "80 *ONOH\n",
      "81 *CN\n",
      "82 CO*COH\n",
      "83 *OCHO\n",
      "84 *COOH\n",
      "85 *OOH\n"
     ]
    }
   ],
   "source": [
    "for idx in range(len(adsorbates)):\n",
    "    print(idx, adsorbates[idx][1])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e07acc2",
   "metadata": {},
   "source": [
    "### If not, you will have to add them.\n",
    "\n",
    "Each adsorbate entry is a tuple containing the following information:\n",
    "1. The adsorbate atoms object\n",
    "2. A string representing the adsorbate and its adsorbed atom(s)\n",
    "    - For monoatomic hydrogen we use `*H` where the asterisk appears just beform the binding atom(s)\n",
    "    - If an adsorbate is definitively multidentate, it appears with multiple asterisks (i.e. `*CH2*CH2`)\n",
    "    - This isnt super important so long as it is meaningful to you, because it is a method of querying the adsorbate from the db using the `Adsorbate` class\n",
    "3. A numpy array containing the binding index/indices\n",
    "    - This is the index of the atom or atoms which are expected to bind\n",
    "    - For `*H` and other monoatomic adsorbates it will always be 0, for others you will have to inspect the adsorbate and pick the correct index/indices\n",
    "4. A html string of the adsorption reaction with the OC20 referencing scheme to CO, H2, H2O, and N2 in the gas phase\n",
    "    - This is so the reaction may be rendered on the demo website and has no other use, so do not worry about it too much\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68108a51",
   "metadata": {},
   "source": [
    "## 2. Set up the basic reaction db elements\n",
    "All three reaction types have the same basic elements. The first few are very straight forward, and will be discussed here. Each db entry is a dictionary and the db is just a list of these dictionaries. Each dictionary contains.\n",
    "1. `reaction` a string representation of the reaction. This may be used to query the db using the `Reaction` class and is used to make the entries human readable. The exact content of it does not impact anything.\n",
    "2. `reaction_type` which is either desorption, dissociation, or transfer. This corresponds to what database the reaction should appear in\n",
    "3. The reactant and product indices. This is different for each of the reaction classes because they have differing numbers of reactant(s) and products(s)\n",
    "    - Desorptions: `reactant` and `product` are entries to the db dictionary\n",
    "    - Dissociation: `reactant`, `product1`, and `product2` are entries to the db dictionary\n",
    "    - Transfer: `reactant1`, `reactant2`, `product1`, and `product2` are entries to the db dictionary\n",
    "    \n",
    "Let's look at examples of these."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "a51b2703",
   "metadata": {},
   "outputs": [],
   "source": [
    "# you may have to change the paths\n",
    "DESORPTION_PATH = \"../databases/desorptions_9Aug23.pkl\"\n",
    "DISSOCIATION_PATH = \"../databases/dissociation_reactions_22May24.pkl\"\n",
    "TRANSFER_PATH = \"../databases/transfers_5Sept23.pkl\"\n",
    "\n",
    "with open(DESORPTION_PATH, \"rb\") as f:\n",
    "    desorptions = pickle.load(f)\n",
    "    \n",
    "with open(DISSOCIATION_PATH, \"rb\") as f:\n",
    "    dissociations = pickle.load(f)\n",
    "    \n",
    "with open(TRANSFER_PATH, \"rb\") as f:\n",
    "    transfers = pickle.load(f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "f93f5b25",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'reaction': '*CO -> CO(g)',\n",
       " 'reaction_type': 'desorption',\n",
       " 'reactant': 5,\n",
       " 'product': 5,\n",
       " 'idx_mapping': [{0: 0, 1: 1}],\n",
       " 'edge_indices': [(0, 1)],\n",
       " 'broken_edge': []}"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "desorptions[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a7040ef",
   "metadata": {},
   "source": [
    "For desorptions, the reactant and product indices are redundant. As can be seen above, `*CO` has an index of 5 in the adsorbate db, and that is its index in the desorption db."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "dea2daef",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'reaction': '*OH -> *O + *H',\n",
       " 'reaction_type': 'dissociation',\n",
       " 'reactant': 2,\n",
       " 'product1': 0,\n",
       " 'product2': 1,\n",
       " 'idx_mapping': [{0: 0, 1: 1}],\n",
       " 'edge_indices_initial': [(0, 1)],\n",
       " 'edge_indices_final': []}"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dissociations[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "548f869a",
   "metadata": {},
   "source": [
    "Product 1 is defined as sharing the same binding atom as the reactant, so in this case it must be `*O`. As can be seen above the indices listed for the reactants and products match those in the adsorbate db."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "3655eba3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'reaction': '*OH + *CH2 -> *O + *CH3',\n",
       " 'reaction_type': 'transfer',\n",
       " 'reactant1': 2,\n",
       " 'reactant2': 9,\n",
       " 'product1': 0,\n",
       " 'product2': 12,\n",
       " 'idx_mapping': [{0: 0, 1: 2, 2: 1, 3: 3, 4: 4},\n",
       "  {0: 0, 1: 3, 2: 1, 3: 2, 4: 4},\n",
       "  {0: 0, 1: 4, 2: 1, 3: 3, 4: 2}],\n",
       " 'edge_indices_initial': [(0, 1), (2, 3), (2, 4)],\n",
       " 'edge_indices_final': [(1, 2), (1, 3), (1, 4)]}"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transfers[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f2281d3",
   "metadata": {},
   "source": [
    "Product 1 must share the same binding atom as reactant 1 and product 2 must share the same binding atom as reactant 2. As can be seen above the indices listed for the reactants and products match those in the adsorbate db."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c861e38",
   "metadata": {},
   "source": [
    "### Complete this now for the reaction you are interested in"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7d182936",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_entry = {\"reaction\": ,\n",
    "             \"reaction_type\": ,\n",
    "            }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46e94bce",
   "metadata": {},
   "source": [
    "## 3. Add edges which will be enforced\n",
    "\n",
    "As a part of the workflow to enumerate possible reactant and product configurations, we enforce that the bonds we expect are maintained and no erroneous additional bonds exist. The information in `edge_indices_initial` and `edge_indices_final` or `edge_indices` facillitate this. The functions below should automatically handle this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "8dea0cbb",
   "metadata": {},
   "outputs": [],
   "source": [
    "from itertools import combinations\n",
    "from ase.data import atomic_numbers, covalent_radii\n",
    "import ase\n",
    "\n",
    "def get_edges(atoms: ase.Atoms):\n",
    "    \"\"\"\n",
    "    Get the edges for all atoms in an atoms object.\n",
    "\n",
    "    Args:\n",
    "        edge_list (list[tuples]): The edges\n",
    "    \"\"\"\n",
    "    edge_list = []\n",
    "    elements = atoms.get_chemical_symbols()\n",
    "    all_combos = list(combinations(range(len(atoms)), 2))\n",
    "    for combo in all_combos:\n",
    "        total_distance = atoms.get_distance(combo[0], combo[1], mic=True)\n",
    "        r1 = covalent_radii[atomic_numbers[elements[combo[0]]]]\n",
    "        r2 = covalent_radii[atomic_numbers[elements[combo[1]]]]\n",
    "        distance_ratio = total_distance / (r1 + r2)\n",
    "        if distance_ratio <= 1.25:\n",
    "            edge_list.append(tuple(combo))\n",
    "    return edge_list\n",
    "\n",
    "def add_edges(entry, adsorbate_db):\n",
    "    if entry[\"reaction_type\"] == \"desorption\":\n",
    "        adsorbate = adsorbate_db[entry[\"reactant\"]][0]\n",
    "        entry[\"edge_indices\"] = get_edges(adsorbate)\n",
    "        entry[\"broken_edge\"] = []\n",
    "        \n",
    "    elif entry[\"reaction_type\"] == \"dissociation\":\n",
    "        reactant = adsorbate_db[entry[\"reactant\"]][0]\n",
    "        entry[\"edge_indices_initial\"] = get_edges(reactant)\n",
    "        \n",
    "        pdt1 = adsorbate_db[entry[\"product1\"]][0].copy()\n",
    "        pdt2 = adsorbate_db[entry[\"product2\"]][0].copy()\n",
    "        pdt2.translate([99,99,99]) # just so they dont overlap when concatenated\n",
    "        \n",
    "        pdts = pdt1 + pdt2\n",
    "        entry[\"edge_indices_final\"] = get_edges(pdts)\n",
    "        \n",
    "    elif entry[\"reaction_type\"] == \"transfer\":\n",
    "        rxt1 = adsorbate_db[entry[\"reactant1\"]][0].copy()\n",
    "        rxt2 = adsorbate_db[entry[\"reactant2\"]][0].copy()\n",
    "        rxt2.translate([99,99,99])\n",
    "        rxts = rxt1 + rxt2\n",
    "        entry[\"edge_indices_initial\"] = get_edges(rxts)\n",
    "        \n",
    "        pdt1 = adsorbate_db[entry[\"product1\"]][0].copy()\n",
    "        pdt2 = adsorbate_db[entry[\"product2\"]][0].copy()\n",
    "        pdt2.translate([99,99,99])\n",
    "        \n",
    "        pdts = pdt1 + pdt2\n",
    "        entry[\"edge_indices_final\"] = get_edges(pdts)\n",
    "        \n",
    "    return entry"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c44f739d",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_entry = add_edges(new_entry, adsorbates)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60448510",
   "metadata": {},
   "source": [
    "## 4. Add the mapping scheme\n",
    "Lastly, we have to add a mapping so the ordering of the adsorbate atoms in the initial frame match the ordering in the final frame. In the case of desorptions, this is a trivial 1:1 mapping. For transfers and dissociations it can be more complicated, particularly when there are multiple possible leaving groups. As an example, for `*CH2 -> *CH + *H` there are 2 possible hydrogens which can leave. We want to pick the one that minimizes the total distance traversed by all atoms to get from the reactant state to the product state. This happend automatically in the AutoFrame classes, but to do so we have to allow the multiple possible mappings in the database file. It is possible that this may be done algorithmically, but that was not done for this work. If you devise a way, let us know. Let's walk through an example with 2 possible mappings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "f5874b50",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'reaction': '*CH2*O -> *CHO + *H',\n",
       " 'reaction_type': 'dissociation',\n",
       " 'reactant': 10,\n",
       " 'product1': 7,\n",
       " 'product2': 1,\n",
       " 'idx_mapping': [{0: 0, 1: 3, 2: 1, 3: 2}, {0: 0, 1: 1, 2: 3, 3: 2}],\n",
       " 'edge_indices_initial': [(0, 1), (0, 2), (0, 3)],\n",
       " 'edge_indices_final': [(0, 1), (0, 2)]}"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ex = dissociations[10]\n",
    "ex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "35f9eb82",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "        <iframe\n",
       "            width=\"1000px\"\n",
       "            height=\"500px\"\n",
       "            src=\"x3dase.html\"\n",
       "            frameborder=\"0\"\n",
       "            allowfullscreen\n",
       "            \n",
       "        ></iframe>\n",
       "        "
      ],
      "text/plain": [
       "<IPython.lib.display.IFrame at 0x7f70a61db6d0>"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from x3dase.visualize import view_x3d_n\n",
    "\n",
    "view_x3d_n(adsorbates[ex[\"reactant\"]][0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "5b36fb66",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "        <iframe\n",
       "            width=\"1000px\"\n",
       "            height=\"500px\"\n",
       "            src=\"x3dase.html\"\n",
       "            frameborder=\"0\"\n",
       "            allowfullscreen\n",
       "            \n",
       "        ></iframe>\n",
       "        "
      ],
      "text/plain": [
       "<IPython.lib.display.IFrame at 0x7f70c6607310>"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pdt1 = adsorbates[ex[\"product1\"]][0].copy()\n",
    "pdt2 = adsorbates[ex[\"product2\"]][0].copy()\n",
    "pdt2.translate([2,2,2])\n",
    "\n",
    "view_x3d_n(pdt1 + pdt2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7af8f29c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The reactant is:\n",
    "#          H(1)\n",
    "#         /\n",
    "# O(3) = C(0)\n",
    "#         \\\n",
    "#          H(2)\n",
    "         \n",
    "# The mapping tells us the potential indices in the product state:\n",
    "\n",
    "#             H(3)                       H(1)\n",
    "#            /                          /\n",
    "#    O(2) = C(0)       or       O(2) = C(0)\n",
    "#            \\                         \\\n",
    "#            H(1)                      H(3)\n",
    "\n",
    "# This gives a mapping of:\n",
    "#             H(1:3)                           H(1:1)\n",
    "#            /                                /\n",
    "#    O(3:2) = C(0:0)       or       O(3:2) = C(0:0)\n",
    "#            \\                               \\\n",
    "#            H(2:1)                          H(2:3)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c417e8d",
   "metadata": {},
   "source": [
    "Therefore the mapping should be \n",
    "```\n",
    "[{0: 0, 1: 3, 2: 1, 3: 2}, {0: 0, 1: 1, 2: 3, 3: 2}]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0e27291b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add your mapping and then add it to the appropriate file!\n",
    "\n",
    "new_entry[\"idx_mapping\"] = []\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
